Special Trees ============= We study here a couple of special trees with useful properties. Binary Search Trees ------------------- What is a Binary Search Tree? A binary tree Carefully constructed for fast lookup of items What is a Key? Part of an item used for searching What is the defining property of a Binary Search Tree? BST Order Property For every node X in the tree Keys in left subtree are smaller than key in X Keys in right subtree are larger than key in X What operations does a Binary Search Tree provide? Add an item Find an item Remove an item For these operations, how do Binary Search Trees compare to Lists? Lists are O(n) BSTs are O(log n) In the average case, how many steps are needed to find an item: In a List of 1,000,000 items? 500,000 In a Binary Search Tree of 1,000,000 items? 20 What other data types are efficiently implemented using BSTs? Set, Map, Priority Queue How do you find a key in a Binary Search Tree? Compare x to root If smaller, repeat in left subtree If larger, repeat in right subtree How do you find the minimum key in a Binary Search Tree? Repeatedly follow left child How do you find the maximum key in a Binary Search Tree? Repeatedly follow right child How do you insert a key in a Binary Search Tree? In place of null pointer where find failed How do you remove a key from a Binary Search Tree? Find node to remove If leaf Remove node If only one child Connect parent to child If two children Replace with smallest node in right subtree Remove smallest node in right subtree Class Exercise Show the result of inserting C, A, D, F, E, B into an initially empty BST. Show the result of removing C, E, D from the previous BST. (a) BSTree.java Code for find, insert, and remove Recall that the cost of add, find and remove is O(log N). What characteristic of the tree affects this cost? Cost is proportional to the height of the tree What is the height of a BST with N nodes? It depends Best vs. worst vs. average cases What is the best-case height of a BST with N nodes? The tree is perfectly balanced The height is O(log N) The cost of find, insert, and remove is O(log N) What is the worst-case height of a BST with N nodes? The tree is like a linked list The height is O(N) The cost of find, insert, and remove is O(N) What is the average-case height of a BST with N nodes? Assume random insertion sequence Average of all insertion orders The average case is 38% worse than the best case The height is still O(log N) The cost of find, insert, and remove is O(log N) What happens if you insert a sorted sequence of items into a BST? Causes worst-case behavior Need to keep tree balanced AVL Trees --------- What is a Balanced BST? A BST with insert and remove algorithms that always keep the tree balanced What are examples of specific types of Balanced BSTs? AVL trees (first balanced BST, we will study) Red-Black trees (often used in practice, we will not study) What is an AVL tree? BST with extra balance property Named after Adelson-Velskii and Landis What is the AVL Balance Property? For any node in the tree, the height of left and right subtrees differ by at most 1 How well does an AVL tree stay balanced? Worst case height is about 1.44 times best case Typical case height is close to log N Find, insert, and remove are O(log n) How do you find a key in an AVL tree? Same as in a regular BST How do you insert a key in an AVL tree? Insert as in a standard BST If necessary, restore balance using rotations How do you know if rebalancing is needed? Look for nodes that violate the AVL property Where do you find the unbalanced nodes that need balancing? Only nodes from root to insert point can be out of balance For insertion, only fix balance at deepest node; this fixes the whole tree What are the four possible cases where a node can be out of balance? left > right, left-left > left-right left > right, left-left < left-right right > left, right-right > right-left right > left, right-right < right-left How do you rebalance when the left child is higher and the left-left grandchild is higher? Single Right Rotation left = root.left; root.left = left.right; left.right = root; return left; How do you rebalance when the right child is higher and the right-right grandchild is higher? Single Left Rotation right = root.right; root.right = right.left; right.left = root; return right; What happens if you try to do a single right rotation when the left child is higher and the left-right grandchild is higher? Single rotate doesn't work Tree is still unbalanced How can you correct this type of imbalance? Solve with 2 single rotations Double Right Rotation root.left = rotateLeft(root.left); root = rotateRight(root); How do you rebalance when the right child is higher and the right-left grandchild is higher? Double Left Rotation root.right = rotateRight(root.right); root = rotateLeft(root); Class Exercise Show the insertion of values 1, 10, 2, 9, 3, 8, 4, 7, 5, 6 into an AVL tree. Show the result of inserting C, A, D, F, E, B into an AVL tree. Show the result of removing C, E, D from the previous tree. Implementing AVL Trees ---------------------- Suppose we wish to implement the Set interface (add/remove/contains) using BSTs Our BSTs must stay balanced using AVL balancing The Iterator will traverse our tree in preorder Our implementation must run successfully and efficiently with some fairly large trees We could write our own code from scratch or we could start with the BST code we have already written Which would be best? Start with BST code We would need to know the height of each node in the tree to implement the AVL balancing. We could write a method to compute the height of a node whenever it is needed or we could store a node's height inside of itself and only recompute the height when you know it changes. Which would be best? Store height in each node Whenever the tree changes due to insert or remove we would need to use rotations to balance nodes that need balancing. We could rebalance as the recursion unwinds from the insert or remove methods or we could leave the regular BST insert and remove methods unchanged and write a separate method that checks every node in the tree to see if it needs to be balanced. Which would be best? Balance when unwinding from recursion Suppose we wrote a method named 'balance' that would balance a node. What would we give as input to 'balance'? A node to be balanced What would be the first step of 'balance'? Compare the heights of the children to determine what balancing is needed What would be the next step of 'balance'? Call the appropriate rotation(s) Update the heights of any nodes that change height What would be returned from 'balance'? A pointer to the new root node of the subtree that has been balanced What would the code for the various rotations be? See above How would we implement a preorder iterator? Would we use recursion? Do not use recursion Use a stack initialized with the root node Pop the node to visit from the stack Push the children of that node on the stack